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This  paper  gives  a  brief  survey  of  the  uses  of  stochastic  calculus  in  survival  analysis.  The  role 
played  by  martingale  central  limit  theory  in  deriving  asymptotic  distributions  of  estimators  and  test 
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1.  Introduction 


Beginning  with  Aalen’s  1975  thesis  from  Berkeley,  there  has  been  a  rapid  increase  in  the  use 
of  stochastic  calculus  as  a  tool  in  the  study  of  survival  analysis.  Aalen  realized  that  multivariate 
counting  processes  provide  a  natural  framework  for  the  study  of  censored  survival  data  and  that  a 
central  role  is  played  by  martingales,  predictable  processes  and  stochastic  integrals.  The  counting 
process  approach  has  been  successfully  applied  in  the  study  of  Nelson’s  cumulative  hazard  esti¬ 
mator  (Aalen,  1978),  the  Kaplan-Meier  product-limit  estimator  (Aalen  and  Johansen,  1978;  Gill, 
1980, 1983),  nonparametric  k-sample  tests  (Andersen  et  al.,  1982)  and  Cox’s  proportional  hazards 
regression  model  (Andersen  and  Gill,  1982),  to  name  just  a  few  examples. 

The  list  of  such  applications  is  now  quite  long,  and  as  new  estimators,  models  and  data 
structures  are  introduced,  they  too  can  often  be  treated  under  the  umbrella  of  the  counting  process 
approach.  Some  of  the  benefits  of  this  are:  (1)  we  can  bring  to  bear  powerful  results  from  the  theory 
of  stochastic  processes;  (2)  more  general  censoring  patterns  can  be  allowed;  (3)  i.i.d.  assumptions 
no  longer  play  a  central  role;  (4)  straightforward,  yet  rigorous,  proofs  can  be  given.  Although 
some  background  in  stochastic  processes  is  needed  to  appreciate  these  advantages,  it  turns  out  (as 
remarked  by  Arjas,  1985)  that  the  er-fields  and  martingales  involved  are  far  more  concrete  and 
practical  than  the  traditional  appoach  via  elementary  mathematics. 

The  purpose  of  the  present  paper  is  to  outline  these  developments  with  special  emphasis  on 
some  of  the  recent  applications  of  Rebolledo’s  martingale  central  limit  theorem  to  the  study  of 
asymptotic  distributions  of  estimators  and  test  statistics.  We  shall  make  no  attempt  to  review 
the  stochastic  calculus  relevant  to  survival  analysis,  but  rather  refer  to  the  books  of  Liptser  and 
Shiryayev  (1978),  Elliott  (1982)  and  Kopp  (1984)  as  necessary.  Earlier  survey  articles  and  books 
containing  material  on  the  use  of  counting  process  theory  in  survival  analysis  have  been  written 
by  Gill  (1980),  Jacobsen  (1982),  Davis  (1983),  Andersen  and  Borgan  (1985),  Shorack  and  Wellner 
(1986),  Karr  (1986)  and  Prakasa  Rao  (1987).  We  also  mention  the  forthcoming  book  of  Andersen, 
Borgan,  Gill  and  Keiding  (1988). 


2.  The  Nelson- Aalen  and  Kaplan-Meier  estimators 


2.1.  Cumulative  hazard  and  survival  functions 


Let  T,  representing  the  survival  time  of  an  individual,  be  a  positive  random  variable  with 
distribution  function  F  having  a  density  /.  The  hazard  function  (or  failure  rate  function)  of  T 
is  defined  by  X(t)  =  f{t)/S{t)  for  t  such  that  S(t)  —  1  -  F(t)  >  0.  Here  5  is  called  the  survival 
function.  Since 

=  P[t  <  T  <  t  +  At|T  >  1) 
v  '  Atio  At 

we  may  interpret  X(t)At  as  the  probability  of  failure  in  the  time  interval  t  to  t  -f  At  given  survival 


up  to  time  t.  It  is  easily  seen  that  the  hazard  function  determines  the  distribution.  For 


A(t)  =  -^logS(0, 


so  that,  in  terms  of  the  cumulative  hazard  function  A (t)  =  A(s)  da,  we  have  S(t)  =  e~Ai*). 

When  F  does  not  have  a  density,  the  hazard  function  cannot  be  defined,  but  the  cumulative 
hazard  function  is  defined  by 


A(«)  =  f 

o.t 


dF(u) 

>.t]  S( u-) 

for  t  such  that  5(t-)  =  limuf«  5(u)  >  0.  The  survival  function  can  be  represented  in  terms  of  the 
cumulative  hazard  function  by  S  =  £(—A),  where  £  is  the  Dolmans- Dade  exponential  defined  by 
the  following  result. 


Theorem  2.1.  (A  special  case  of  Dolmans- Dade  (1970))  Let  X  be  a  right-continuous  function  of 
bounded  variation  with  X0  =  0.  Then  the  equation 

Zt  =  1  +  f  Z.  _  dXt 

has  a  unique  solution  which  is  bounded  on  finite  intervals.  The  solution,  denoted  £  (X)t ,  is  given 
by  Zt  equal  to 

nx)t=e*n(i+Ax.), 

I<( 

where  Xf  =  Xt  -  E.<<  AX.;  AX.  =  X.  -  X._. 

A  proof  of  this  result  can  be  found  in  Liptser  and  Shiryayev  (1978,  pp.255-256).  As  an 
immediate  consequence  (see  Wellner,  1985)  we  obtain  the  formula 

S  =  £(-  A),  (2.1) 


since  S  satisfies  the  equation 

5(f)  =  l+  /  5(u-)rf(-A(u)). 

An  alternative  way  of  representing  S  in  terms  of  A  is  to  use  product  integration.  Let  X  be  a  finite 
Borel  measure  on  the  positive  real  line  and  denote  its  distribution  function  by  the  same  symbol: 
X(0  =  X((0,t]).  The  product  integral  of  X  is  defined  by 

1  +  dX)  =  lim  n^  +  X^-r,*.))), 

(M 

where  0  =  to  <  ti  <  ...  <  tn  =  t  is  a  partition  of  (0,t].  It  can  be  shown  (see  Gill  and  Johansen, 
1988)  that 


2.2.  Random  censorship  models  and  the  counting  process  formulation 


Censoring  is  a  general  phenomenon  which  affects  many  types  of  data.  In  survival  data  it 
typically  takes  the  form  of  random  right  or  left  censorship.  For  example,  consider  a  study  for 
determining  the  age  T  at  which  a  certain  chronic  disease  or  other  permanent  condition  appears 
in  an  individual.  Right  censoring  occurs  if  the  individual  dies  or  leaves  the  study  before  the 
disease  appears.  However,  the  disease  may  have  already  appeared  before  the  individual  entered 
the  study,  and  this  results  in  left  censoring.  In  each  case,  the  exact  value  of  T  cannot  be  determined, 
but  some  useful  information  is  still  available.  One  of  the  most  elegant  features  of  the  counting 
process  formulation  is  that  it  unifies  these  and  more  general  censoring  schemes  under  the  notion  of 
“predictable  censoring”.  Estimators  of  the  survival  function  based  on  right-censored  data  (Kaplan 
and  Meier,  1958),  left-censored  data  (Woodroofe,  1985)  or  doubly-censored  data  (Chang  and  Yang, 
1987)  can  then  be  studied  in  a  unified  way. 

Define  the  basic  counting  process  Nf  =  I(T  <  t ),  In  the  absence  of  censoring,  ATt*  is  the  only 
counting  process  we  need  to  consider.  To  introduce  censoring  into  the  picture,  let  ( Ct ,  t  >  0)  be 
a  {0,  l}-valued  stochastic  process,  called  the  censoring  process,  which  is  understood  to  indicate 
censorship  at  time  t  if  Ct  =  0.  The  observed  counting  process  is  given  by 


Nt  =  fc.dN;  =  {  *  *  jT  <  t  and  CT  -  1 
J0  10  otherwise. 


Next,  define  7tT  =  s  <  <),  the  cr-field  generated  by  Nm  up  to  time  t.  Let  (§t)  be  a  right- 

continuous  filtration  such  that  Qt  is  independent  of  T  for  all  t  and  write  7t~  7?  V  Qt,  the  (7-field 
generated  by  7?  and  Qt.  Also  define  the  processes  Rt  =  7(T  >  t)  and  Yt  =  CtRt,  so  that  Yt  is  the 
indicator  that  the  individual  is  observed  to  be  at  risk  at  time  t.  The  following  result  is  crucial. 

Theorem  2.2.  Suppose  that  the  censoring  process  (C*)  is  predictable  with  respect  to  the  filtration 
( 7t ).  Then  the  process 

Mt  =  Nt-  f  Y.  dA.  (2.3) 

Jo 

is  an  ^-martingale. 

In  the  language  of  counting  processes  this  result  is  saying  that  the  counting  process  Nt  has 
compensator  At  =  Jq  Y,  dA,  with  respect  to  the  filtration  (7t),  see  Liptser  and  Shiryayev  (1978,  p. 
239).  If  T  has  a  hazard  function  A(t),  then  the  result  implies  that  Nt  has  intensity  process  YtA(t) 
and  we  may  write  (2.3)  in  the  form  of  a  stochastic  differential  equation: 

dNt  =  YtX{t)  dt  +  dMt. 

The  result  is  intuitively  reasonable  in  view  of  the  interpretation  of  YtA(t)  dt  as  the  probability  of 
observing  a  failure  in  the  time  interval  t  to  t  +  dt  given  survival  up  to  time  t. 

The  censoring  process  (C<)  is  left-continuous  in  most  applications,  so  in  order  to  show  that  it 
is  predictable,  it  suffices  to  show  that  it  is  (^)-adapted.  In  particular,  the  usual  random  right  or 
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left  censorship  schemes  are  obtained  by  taking  C%  =  I(L  <  t  <  C),  where  L  and  C  are  the  left 
and  right  “censoring  times”  respectively,  assuming  that  ( L,C )  is  independent  of  T,  and  setting 
St  —  So  —  &(L,C).  Note,  however,  that  the  defintion  of  predictable  censoring  is  much  more 
general  than  this. 

Proof  of  Theorem  2.2.  The  main  step  in  the  proof  is  to  show  that  Mt*  =  AT,*  -  f*  R,dA,  is 
an  ^-martingale.  Then,  since  Qt  is  independent  of  7? ,  it  follows  from  a  standard  result  on 
conditional  expectation  (see  Chung,  1974,  p.3Q8)  that  A/t*  is  an  ^-martingale.  Consequently,  since 
Mt  =  f*  C,dM *  can  be  interpreted  as  a  stochastic  integral  (see  Kopp,  1984,  p.149),  Mt  is  an 
/t-martinga'i.  Now  to  show  that  Mt*  is  an  martingale,  let  u  <  t  and  note  that 

E(Mt-  -  m:\7J)  =  E(I(T  >  u)(m;  -  ao  \7j)  =  j(r  >  u)e(m;  -  mu*  |t  >  u) 

since  A  n  {T  >  u}  is  either  the  empty  set  or  { T  >  u}  for  any  A  €  7* .  Also 


e  (n;  -n:\t>u)  = 


and  with  At  =  R.  dk, , 

E(At  -  AU\T  >  u)  =  E{At  -  AU\T  >  t)P(T  >  t|T  >  u) 

-f  E(At  -  Au\u  <T<  t)P(T  <  t\T  >  u) 

,,.,*(0  ,  f  (A(»)  -  A(«»  (F(l)  -  F(u)) 

-  <At‘>  aw)sm + y(Ml  -rorm  «  — sM 

=4oLs(v"),,aw 
= -V = b{n*  ~  kit  >  u)> 

where  we  have  used  the  integration  by  parts  formula  for  Stieltjes  integrals  (see  Liptser  and 
Shiryayev,  1978,  p.253).  This  completes  the  proof. 

The  basic  idea  in  the  above  proof  is  that  the  martingale  property  of  (Mf,7tT)  is  preserved 
when  independent  events  are  added  to  7? .  It  is  natural  to  ask  “How  much  can  we  enlarge  7? , 
while  preserving  the  martingale  property  of  Af/  ?”  We  know  of  no  general  answer  to  this  question, 
although  some  recent  work  of  Jacobsen  (1986,  1988)  dealing  with  right-censoring  is  of  interest  in 
this  regard. 

To  extend  the  counting  process  formulation  to  n  individuals  with  corresponding  survival  times 
Ti,...,T„,  introduce  processes  N{,  Yi,  A f,,  i  =  l,...,n  and  filiations  (7a),  i  =  l,...,n  having  the 
same  structure  as  N,  Y,  M  and  (7t).  It  is  convenient  to  view  the  martingales  Mi,  i  =  1,  ...,n  with 
respect  to  the  same  filtration.  If  7u,...,7nt  are  independent  a-fields  for  all  t,  then  M\,...,Mn  are 
martingales  (in  fact  orthogonal  martingales)  with  respect  to  the  filtration  7}n^  =  7u  V  •••  V  7nt- 
For  some  applications,  however,  it  may  be  too  restrictive  to  assume  that  the  7,t  are  independent. 
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The  general  counting  process  formulation  does  not  require  any  such  assumption;  it  only  requires 
that  are  martingales  with  respect  to  some  filtration  ( j/n^). 

Aalen’s  (1978)  multiplicative  intensity  model  can  now  be  formulated  as  follows.  Let  N(t)  = 
(Ni  (t), ...,  Nn(t)y,  t  €  [0, 1]  be  a  multivariate  counting  process  with  respect  to  a  right-continuous  fil¬ 
tration  i.e.  N  is  adapted  to  the  filtration  and  has  components  N{  which  are  right-continuous 

step  functions,  zero  at  time  zero,  with  jumps  of  size  +1  such  that  no  two  components  jump  simul¬ 

taneously.  Suppose  that  the  counting  process  Ni  has  intensity 

Ai(t)  =  K(t)A(t),  (2.4) 

where  Y{(t)  is  a  predictable  {0,  l}-valued  process  and  A  is  a  fixed  hazard  function.  By  Davis  (1983, 
Proposition  4.2) 

A  i(a)ds,  i  =  l,...,n  (2.5) 

are  orthogonal  local  square  integrable  martingales  on  [0, 1]  with  predictable  variation  process 

A<(s)ds.  (2.6). 

If  E  IVt(l)  <  oo  then  Mi  is  in  fact  a  square  integrable  martingale  on  [0,  lj.  As  Aalen  does,  we  shall 
assume  throughout  that  M,-,  i  =  l,...,n  are  square  integrable  martingales. 


(Mi ,  M, 


».  =  /' 


Mi(t)  =  Ni(t)~  [ 
Jo 


2.S.  The  Nelson- Aalen  estimator 


An  estimator  for  the  cumulative  hazard  function  A  was  introduced  by  Nelson  (1969)  in  the  case 
of  right-censored  survival  data,  and  extended  by  Aalen  (1975,  1978)  to  his  multiplicative  intensity 
model.  The  so  called  Nelson-Aalen  estimator  is  defined  by 


dN^(s) 
Y <")(«)  ’ 


where  =  53™=  i  Ni,  Y^  =  53”=i  and  1/0  =  0.  A  motivation  for  this  estimator  is  provided 
by  formally  solving  the  stochastic  differential  equation 


dNt(n)  =  Yt(n)dAt  +  dMt(n)  (2.7) 

(where  M =  53”=  l  M)  for  At  and  ignoring  the  “noise”  term  f*  (which  is  a  martingale). 

The  asymptotic  distribution  of  A  can  be  derived  under  the  following  asymptotic  stability 
condition: 

(AS)  There  exists  a  function  p,  bounded  away  from  zero  on  [0,1],  such  that  y(nl(t)  =  Ay(n)(t) 
satisfies 

sup  |?M(t)-p(t)|40  as  n  — *  OO.  ' 

ttio,:] 


5 


Note  that  in  the  i.i.d.  case,  in  which  Vi,  t  =  l,...,n  are  i.i.d.  replicates  of  one  another  and  Yi 
has  left-continuous  paths  with  right  hand  limits,  the  asymptotic  stability  condition  (AS)  can  be 
checked  using  Ranga  Rao’s  (1963)  law  of  large  numbers.  Let  D[0, 1]  denote  the  space  of  functions 
on  [0, 1]  which  are  right-continuous  on  (0, 1)  with  left  limits  on  (0, 1],  and  equip  it  with  the  Skorohod 
topology  (see  Billingsley,  1968,  p.lll).  Convergence  in  distribution  will  be  denoted 

Theorem  2.3.  (Aalen,  1978)  Suppose  that  the  asymptotic  stability  condition  (AS)  holds.  Then, 
under  Aalen’s  multiplicative  intensity  model,  y/n[k  —  A)—*m  in  £)[0, 1]  as  n  — ♦  oo,  where  m  is  a 
continuous  Gaussian  martingale  with  covariance  function 


„  ,  .  A(u) 

Cov(m,,mt)  =  /  ~T-\d  u. 

Jo  PW 


Proof.  Using  (2.7)  we  have 


where 


= ds  /’ W’  ** = 7(SM",(,)  - 0) 


‘  y/nj0  P<»)(.)  ’  V  y0  V 

Since  the  condition  (AS)  implies  that 

sup  li^O,  (2.9) 

(€[0,1] 

to  complete  the  proof  it  suffices  to  show  that  in  D  [0,1].  Note  that  Mfn^  is  a  square 

integrable  ^"'-martingale.  Now  apply  the  version  of  Rebolledo’s  (1980)  martingale  central  limit 
theorem  stated  in  Andersen  and  Gill  (1982)  with  p  =  1  and  #{"'(<)  =n~ £  (fW(t))-1,  By  (2.5), 
(2.6)  and  (AS)  we  have 


(2.10) 


The  Lindeberg  condition,  here  given  by 


it/0 

for  all  e  >  0,  follows  from  (AS).  This  completes  the  proof. 

It  is  possible  to  use  Theorem  2.3  to  construct  confidence  bands  for  A,  see  Andersen  and 
Borgan  (1985,  p.114).  In  many  applications  it  is  of  interest  to  estimate  the  hazard  function  A 
itself.  It  is  possible  to  develop  an  asymptotic  distribution  theory  for  pointwise  estimators  of  A(<), 
A (t)  aay,  using  Rebolledo’s  martingale  central  limit  theorem,  much  as  in  the  proof  of  Theorem  2.3. 
This  has  been  done  for  kernel  estimators  (Ramlau-Hansen,  1983),  spline  sieve  estimators  (Karr, 


v-.-y 


1987),  grouped  data  based  estimators  (Borgan  and  Ramlau-Hansen,  1985;  McKeague,  1988b) 
and  penalized  maximum  likelihood  estimators  (Antoniadis,  1987).  Integrating  any  one  of  these 
estimators  provides  another  estimator  /0  A(s)  ds  of  A,  which  (not  surprisingly)  turns  out  to  have 
the  same  asymptotic  distribution  as  the  Nelson- Aalen  estimator. 

2.4.  The  Kaplan-Meier  estimator 

In  view  of  (2.1)  and  (2.2)  it  is  reasonable  to  estimate  the  survival  function  5  by  the  Doleans- 

*  A 

Dade  exponential  or  product  integral  of  -A,  where  A  is  the  Nelson-Aalen  estimator.  Define 

s(t)=£(-A)1  =  na-^). 

(0.<| 

Since  the  continuous  part  of  A  is  zero,  5  reduces  to  the  so  called  “product-limit”  estimator 

<<i 

which  was  originally  introduced  by  Kaplan  and  Meier  (1958)  in  the  case  of  right-censored  survival 
data. 

Breslow  and  Crowley  (1974)  gave  the  first  proof  of  weak  convergence  of  the  Kaplan-Meier 
estimator;  Gill  (1980,  1983)  gave  a  proof  based  on  martingale  methods.  We  shall  present  Gill’s 
proof  in  the  context  of  the  multiplicative  intensity  model.  In  the  classical  i.i.d.  random  censorship 
model  other  proofs  are  possible.  Gill  and  Johansen  (1988)  recently  gave  a  proof  using  Hadamard 
differentiability  and  a  functional  version  of  the  delta  method.  That  approach  works  for  general 
(not  necessarily  continuous)  survival  functions  and  can  be  used  to  study  the  asymptotic  behavior 
of  the  bootstrapped  Kaplan-Meier  estimator;  see  Gill  (1987).  David  Pollard  (1988)  has  informed 
us  that  a  proof  via  the  theory  of  empirical  processes  is  also  possible.  We  refer  the  reader  to  Akritas 
(1986),  HorvAth  and  Yandell  (1987)  and  Lo  and  Singh  (1986)  for  results  on  the  bootstrapped 
Kaplan-Meier  estimator. 

Theorem  2.4.  Suppose  that  the  asymptotic  stability  condition  (AS)  holds.  Then,  under  Aalen ’s 
multiplicative  intensity  model,  y/n(S  -  S)— *S(-)  m(-)  in  Z)[0, 1]  as  n  — ♦  00,  where  m  is  the  Gaussian 
martingale  of  Theorem  2.3. 

Proof.  First  note  that  £(A  -  A)t  =  £(A)*£ ( — A)*  =  St/St,  so,  by  Theorem  2.1, 


fL  =  1+  /  ^pd(A  -  A)(u). 
J  0 


Thus,  using  (2.8),  we  obtain 


y/n(St  -  St )  =  -St  f  +  R<n), 

Jo 


(2.11) 


v. 
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where  -a  a  remainder  term  (different  from  the  original  satisfying  (2.9).  To  complete  the 
proof,  it  suffices  to  show  that  in  D[0, 1],  where 


Now  mf”)  is  a  square  integrable  martingale  with  predictable  variation  process 


Thus,  by  (AS),  we  have  (m^n^,mf"^)i  =  Op(  1).  Using  Lenglart’s  (1977)  inequality  and  (2.11)  it 

VN  p 

follows  that  sup1€[0  j|  \St  -  S<|-+0.  Hence 


(m(n),m(n))t£  /  du  =  (m,m)t. 

Jo  Plu) 


The  Lindeberg  condition  for  is  checked  in  the  same  way  it  was  checked  for  M^nK  The  result 
follows  by  Rebolledo's  martingale  central  limit  theorem. 

It  is  natural  to  ask  whether  the  above  results  have  any  extension  to  two-dimensional  survival 
times,  T  =  (Ti,T3)  say.  Data  of  that  kind  can  arise,  for  example,  in  a  study  of  the  ages  Ti,  T3  at 
which  two  different  chronic  diseases  appear  in  an  individual.  Unfortunately,  many  of  the  techniques 
that  are  useful  in  the  univariate  case  are  no  longer  applicable  in  the  bivariate  case.  In  particular, 
a  two-parameter  martingale  central  limit  theorem  is  not  available.  The  best  results  that  are 
currently  available  are  all  for  the  i.i.d.  case  with  right  censoring  and  rely  on  classical  methods;  see 
Tsai,  Leurgans  and  Crowley  (1986)  for  instance. 


3.  Regression  models  for  survival  data 

In  most  applications  of  survival  analysis  it  is  important  to  consider  the  effects  that  covariates 
may  have  upon  the  survival  times  of  individuals  in  the  study.  This  can  be  done  by  using  a  regression 
model  for  the  conditional  hazard  function  \{t,z)  =  A(l|z)  of  the  survival  time  of  an  individual  who 
has  a  covariate  vector  e  -  ( zu...,zp )',  say,  at  time  t.  The  well  known  proportional  hazards  model 
of  Cox  (1972)  has  been  the  most  popular  model,  but  in  recent  years  other  models  have  begun  to  be 
considered.  We  list  Cox’s  model  and  various  other  alternative  nonparametric  and  semiparametric 
models  with  which  we  are  familiar  as  follows. 

(1)  Cox’s  (1972)  proportional  hazards  model: 


Ht,z)  =  Xo(t)  t?**. 


■m 


where  Ao  is  an  unknown  baseline  hazard  function  and  /9o  is  a  vector  of  p  unknown  parameters. 

(2)  Aalen’s  (1980)  additive  risk  model: 

p 

A(M)  = 

j=i 

where  ai,...,ap  are  unknown  functions. 

(3)  The  general  nonparametric  model.  Beran  (1981)  considered 

A(t,z)  is  arbitrary. 

(4)  Variations  on  Cox’s  model.  The  general  proportional  hazards  model 

A(t,z)  =  A0(<)r(z), 

where  A0  is  an  unknown  baseline  hazard  function  and  r  is  an  unknown  “relative  risk”  func¬ 
tion  was  proposed  by  Thomas  (1983).  Hastie  and  Tibshirani  (1987)  suggested  the  generalized 
additive  model  r(z)  =  ry(zy),  where  ri,...,rp  are  unknown  functions.  Prentice  and  Self 

(1983)  take  r(z)  =  t0(P'0z),  where  r0  is  known  and  (3q  is  a  vector  of  p  unknown  parameters. 
Zucker  (1986)  and  Zucker  and  Karr  (1987)  generalized  Cox’s  model  by  allowing  /?0  to  be  time- 
dependent. 

Since  the  papers  of  Andersen  and  Gill  (1982)  and  Aalen  (1980),  which  developed  asymptotic 
theory  for  the  models  (l)  and  (2),  martingale  methods  have  been  used  to  obtain  asymptotic  theory 
for  most  of  these  models.  In  this  section  we  review  some  of  that  work.  Throughout  we  shall  use 
the  following  counting  process  framework,  extending  the  multiplicative  intensity  model  to  allow 
for  covariates.  Suppose  that  N(t)  =  (Ni (t),...,Nn(t))'  is  a  multivariate  counting  process  with 
respect  to  a  right-continuous  filtration  (7t^).  The  counting  process  Nt,  which  records  events  in 
the  life  of  the  *th  individual,  is  assumed  to  have  intensity  process  A,  (t)  =  Y{(t)  X(t,Z,(t)),  where 
Yi(t)  is  a  predictable  {0,  l}-valued  process  as  before,  and  Z{(t)  =  (Zu(t),  ...,  Z,p(f))'  is  a  p- vector 
of  predictable  covariate  processes.  The  martingales  are  again  defined  by  (2.5).  For 

simplicity,  we  shall  only  consider  the  i.i.d.  case  in  which  (Ni,Yi,Zi),  i  =  l,...,n  are  i.i.d.  replicates 
of  ( N,Y ,  Z).  Also,  assume  that  Y  and  Z  are  left-continuous  with  right  hand  limits  and  the  covariate 
processes  are  bounded. 


S.l.  Cox’s  proportional  hazards  model 

In  this  section  we  shall  briefly  sketch  the  main  results  of  Andersen  and  Gill  (1982).  We  refer 
to  the  review  paper  of  Davis  (1983)  for  an  informal  discussion  of  these  results  and  the  motivation 
behind  the  estimators.  For  simplicity  of  presentation,  we  shall  assume  that  the  covariates  are  scalar 
valued  (p  =  1). 


m 


Cox  (1972, 1975)  proposed  that  inference  for  Pa  in  (1)  be  based  on  the  partial  likelihood  function 

"  (  ffiZAT.)  sS, 
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where  6i  and  T,-  are  the  indi-ator  of  noncensorship  and  the  survival  time  for  the  tth  individual 
respectively,  and  £<  is  the  “risk  set”  consisting  of  all  individuals  which  are  observed  to  be  at  risk 
at  time  T<.  Let  P  be  the  value  that  maximizes  L{0).  In  terms  of  the  underlying  counting  processes, 
the  estimate  P  is  the  unique  solution  to  log  L(P)  =U(P,  1)  =  0,  where 

«W"‘>  =  tfj*  M  -  <31> 

sW(p,t)  =  sw,  (3.2) 

n  t=i 

for  j  =  0, 1,2,  where  0°  =  1.  The  following  theorem  gives  the  asymptotic  distribution  of  p.  Define 

s^(P,t)  =  ES^(P,t),  e  =  s<1V«(0),  .;  =  *<2V*(0)-*2and  £  =  ['  v(p0,t)  a^(p0,t)  M*)dt. 

Jo 

Theorem  3.1.  Suppose  that  A0  is  integrable  over  (0,1],  *^(v)  “  bounded  away  from  0  in  a 
neighborhood  of  Pa,  and  E  >  0.  Then  >/n(/3  —  /io)—*N  (0,  E-1). 

Proof.  (Sketch)  By  the  mean  value  theorem 

U  0, 1)  -  tf(A>,  1)  =  -  I{P*  ,l)0~  Po),  (33) 


where  p*  lies  between  Pa  and  P,  and 


W.O  -  -IfVA  -  £ {fi^j  •  QW"WW 

where  =  SI=i  But  E0,1)  —  0,  so  from  (3.3)  we  obtain 

rrAn  o  i  n~*U{P0,l) 

The  key  step  in  the  proof  is  to  see  that  U(Por)  is  a  martingale: 

rr,a  ,\  _  V.  /  »  f..\  S(1,($0,u)\  . 


Let  mi  be  a  continuous  Gaussian  martingale  with  variation  process 


(mi)t  =  f  v(A>,u)s(0)(£o,u)Ao(u)<*u. 
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Then 


so  by  Rebolledo’s  martingale  central  limit  theorem  n- j  U((30,-)^->mi  in  Z?[0,  l].  Consequently, 
n~$U(0o,  l)—*N  (0,E)  and  from  (3.4),  to  complete  the  proof  it  suffices  to  show  that  n-1 1(0*,  1) 

p 

— »E.  This  is  done  in  Andersen  and  Gill  (1982,  p.1108),  but  it  is  to  be  expected  because  they  show 

A  p  p 

that  (3 — */?o.  so  /?*—►$>>  and  we  can  write  n-1  J(/Sq,1)  in  the  form 


£{sW(fl0,u) 


(S<l>(fi o,«))a 

o,u) 


frswy o,u) 

Jo  iSl°)(fi o,«) 


0,u))2 

V  S(°)(A>,«) 


)a}rfAfW(ti), 


where  M*n)  =  ]C"=i  M*  The  first  term  above  tends  in  probabilty  to  S  and  the  second  term  tends 
in  probabilty  to  zero,  by  Lenglart’s  inequality. 


Breslow  (1972,  1974)  suggested  that  the  cumulative  baseline  hazard  function  Ao(t)  = 

Jo  Ao  (s)ds  could  be  estimated  by  a  piecewise  linear  approximation  to  the  Nelson- Aalen  type  esti¬ 
mator 

U  k  nSW0,v.)' 


Theorem  3.2.  Under  the  conditions  of  Theorem  3.1,  y/n(A  -  Ao)-^mo(-)  +  0(-)  mi  (1)  in  D fO,  1], 
where  mo  and  mi  are  independent  zero  mean  Gaussian  martingales,  (mjt  is  defined  by  (3.5  J, 

(mo}t  =  du  and  0(t)  =  S_1  e(A>, u)  Ao(u)  du. 

S.2.  Aden’s  additive  risk  model 

In  some  applications,  additive  risk  models  are  more  appropriate  than  proportional  hazards 
models.  However,  although  parametric  additive  risk  models  have  been  used  in  survival  analysis 
(especially  in  epidemiology)  for  many  years  (see  the  references  in  Breslow,  1986;  Muirhead  and 
Darby,  1987),  the  nonparametric  additive  risk  model  (2)  has  only  been  studied  recently  (Aalen, 
1980;  McKeague,  1986,  1988a,  1988b;  Huffer  and  McKeague,  1988). 

Let  a  =  (oi,  ...,ap)'  and  denote  Yi,(t)  =Yi{t)  Zij[t),  A(t)  =  J*o  a(s)  ds  for  fixed  to.  0  <  t0  <  1. 
Aalen  (1980)  proposed  estimators  A  of  A  of  the  form  A(t)  =  f*o  Y~(s)  dN(s),  where  Y~(s)  is 
a  predictable  generalized  inverse  of  the  n  x  p  matrix  y(s)  =  (Yiy(s)).  In  the  case  p  =  1,  with 
(y~(s)) =  (Sfc=i Tfci(s))-1.  *  =  1,  n,  A  is  the  Nelson-Aalen  estimator.  For  p  >  1,  Aalen 
suggested  using  Y~(s )  =  (Y,(s)Y(s))~1Y,(s),  where  here  and  in  the  sequel,  for  any  square  matrix 
(or  scalar)  D,  D~l  denotes  the  inverse  of  D  if  D  is  invertible,  the  zero  matrix  otherwise.  Aalen 
observed  that  this  choice  of  Y~  can  be  motivated  by  a  formal  least  squares  principle  and  that 


the  resulting  estimator  A(t)  =  J^0(Y,(s)  Y (a))-1K'(s)  dN(s),  referred  to  as  Aalen’s  least  squares 
estimator,  probably  gives  reasonable  but  not  optimal  estimates  of  A.  Huffer  and  McKeague  (1988) 
proposed  using  the  following  generalized  inverse  of  V(s): 

r-(s)  =  (!"(•)*«  ^M)"1^)  W(»),  (3.6) 

where  W(t)  is  the  n  x  n  diagonal  matrix  with  ith  diagonal  entry  W,(f)=  (A »(t))-1,  and 

*<(<)  =  £>;(*)*,(*),  (3.7) 

3=1 

where  dy  is  a  predictable  estimator  of  ay.  The  estimator  dy  is  taken  to  be  the  jth  component  of 
the  smoothed  least  squares  estimator 

£*?-&*** 

where  K  is  a  left-continuous  bounded  kernel  function  having  integral  1,  support  [0,1]  and  bn  >  0 
is  a  bandwidth  parameter.  The  choice  of  generalized  inverse  (3.6)  defines  the  so  called  weighted 
least  squares  estimator 

>4(0  =  /Vw  w'M  ^(o-  (3.8) 

Jt0 

In  the  case  of  a  single  covariate  the  weighted  least  squares  estimator  coincides  with  the  Nelson- 
Aalen  estimator. 

A  heuristic  explanation  for  the  choice  of  weight  matrix  W  (t)  is  as  follows.  By  conditioning  on 
the  past  7^  we  may  interpret  the  stochastic  differential  equation  dN(t)  —Y(t)a[t)  dt  +  dM(t), 
increment  by  increment,  as  standard  linear  regression  model  with  heteroscedastic  errors  dM(t), 
where  M  =  (Afi, ...,  Afn)'.  We  should  choose  the  weight  matrix  to  be  proportional  to  the  inverse 
of  the  error  covariance  matrix  Cov(dM(t)|^n^)  =  the  n  x  n  diagonal  matrix  with  tth  diagonal 
entry  A i(t)dt.  However,  A,(t)  depends  on  the  unknown  a(t).  Estimating  A,(t)  by  (3.7),  where  the 
estimator  a(t )  only  depends  on  the  past  (since  the  kernel  K  has  support  [0, 1]),  leads  to  W(t). 

The  following  result,  which  gives  the  asymptotic  distribution  of  A,  is  a  special  case  of  Theorem 
3.2  of  McKeague  (1988a).  Let  L(t)  and  V (t)  denote  the  p  x  p  matrices  with  entries  Lyk(f)  = 
EYli(t)Ylk(t)  and  Vy*(t)  =  EYij(t)Yuc[t) Aj  *(t),  respectively,  and  let  Z?[t0,l]p  denote  the  prod¬ 
uct  of  p  copies  of  the  Skorohod  space  Z?[to,  1]. 

Theorem  3.3.  Suppose  that  ai,...,ap,  £>(•),  V(>)  are  continuous,  L[t)  and  V(t)  are  nonsingular  for 
all  t  G  [0, 1],  A(t,  Zt)  is  bounded  away  from  zero,  bn  — *■  0,  — »  oo,  and  the  kernel  function  K  has 

bounded  variation.  Let  0  <  to  <  1-  Then,  under  Aalen’s  additive  risk  model,  \/n(A  -  A)^*m  in 
Z?[to,l]p,  where  m  is  a  p- variate  continuous  Gaussian  martingale  with  mean  zero  and  covariance 
function 

Cov(my(t),mfc(f))=  f  (V~1(s))]k  ds. 

Jto 


S.S.  The  general  nonparametric  model 


The  fully  nonparametric  model  (3)  was  first  studied  by  Beran  (1981).  It  can  be  applied  success¬ 
fully  only  when  the  sample  size  is  very  large  and  there  are  a  small  number  of  covariates.  Inference 
for  this  model  has  been  studied  further  by  Doksum  and  Yandell  (1982),  Dabrowska  (1987a,  1987b), 
McKeague  and  Utikal  (1987,  1988a)  and  Cheng  (1987).  In  this  section  we  discuss  the  main  result 
of  McKeague  and  Utikal  (1988a)  who  introduced  an  estimator  for  the  “doubly"  cumulative  hazard 
function  A(t,  z)  =  f*  f*  X(s,x)  dsdx,  (t,  z)  G  [0,1]2.  This  estimator  turns  out  to  be  important  in 
the  development  of  goodness-of-fit  tests  for  specific  regression  models.  For  simplicity  we  shall  only 
consider  the  case  of  a  single  covariate  (p  =  1). 

Let  Ir,  r  =  1,  ...,dn  be  a  partition  of  the  unit  interval,  where  Jr  =  \zT-\,  zr),  zr  =  r/dn  and  dn 
is  an  increasing  sequence  of  positive  integers.  Let  Nir(t)  be  the  counting  process  which  registers  the 
jumps  of  Ni(t)  when  Z,(t)  €  Jr,  so  that  Nir(t)  =f*  I{Zi(a)  €  Jr}dJV,(s).  Beran  (1981)  suggested 
that  the  cumulative  conditional  hazard  function  A(t,z)  =  f*  X{s,z)ds  could  be  estimated  by  the 
Nelson- Aalen  type  estimator 


dN<n)(a) 
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for  z  G  Ir, 


and  that  the  conditional  survival  function  5(t|z)  =  e  AU.*)  could  be  estimated  by  the  product-limit 
estimator 

5(t|z)  =  !!(!- AA(s,z)), 

i<( 

where  Yr  (a)  =  ^”=1  7{2,(s)  G  Ir}Yi(a)  and  ^»>-  Here  dn  should  tend  to  infinity 

at  a  suitable  rate  as  n  — ♦  oo.  Dabrowska  (1987a,  1987b)  obtained  weak  convergence  results  for 
such  estimators  in  the  case  of  right-censoring  and  non-time-dependent  covariate,  using  the  classical 
approach  of  Breslow  and  Crowley  (1974).  McKeague  and  Utikal  (1987),  using  the  martingale  ap¬ 
proach,  obtained  asymptotic  results  for  A  under  general  predictable  censoring  and  time-dependent 
covariates. 

McKeague  and  Utikal  (1988a)  proposed  to  estimate  A  by 


*(*>*)=[  A(t,x)dx,  (3.9) 

Jo 

and  obtained  the  following  weak  convergence  result  for  A.  Let  /0*  <f>(8,  x)  dW («,  x)  denote  a 
continuous  version  of  the  Wiener  integral  of  a  function  $  G  L2([0, l]2,  dsdx)  with  respect  to  a 
Brownian  sheet  W\  see  Wong  and  Zakai  (1974).  Suppose  that  for  each  t  G  [0,1],  the  random 
vector  (Zt,Yt)  is  absolutely  continuous  with  respect  to  the  product  of  Lebesgue  measure  on  [0,1] 
and  counting  measure,  and  denote  the  corresponding  density  by  fz(t)Y(t){z,y)-  Also,  assume  that 
fz(t)Y(t){z,  1)  is  a  positive,  continuous  function  of  (t,  z)  G  [0,  l]2.  Let  D?  denote  the  extension  of 
Skorohod  space  Z?[0, 1]  to  functions  on  [0,  l]2,  as  defined  in  Neuhaus  (1971). 


Theorem  3.4.  Suppose  that  A  is  Lipschitz,  d^/n  — ►  oo  and  dn  =  0(11^)  for  some  6  €  (§,  1).  Then 
y/n(A  -  A)-^*m  in  D?  as  n  -*  00,  where  m  =  (m(t,  z),  (i,  z )  €  [0,  l]3)  is  given  by 


*(<»*)=  /  f  y/h(a,x)dW(a,x), 
Jo  Jo 


h(»,x)  = 


A(a,  x ) 

fz(,)Y  («)(*»*)' 


Proof.  (Sketch)  It  can  be  shown  easily  that  \/n(A  —  A)  is  asymptotically  equivalent  in  distribution 
to  Af 1 ,  where 


z)  =  [*  dMrn](s) 

{  ,  )  *n  hJo  nww  ’ 

MrW(t)  =  ts:  I{Zi{8)  €  Jr}y;(s) dMi(a),  r  =  1  ,...,d„ 


r>(8) 

■n,M  ’ 


(3.10) 
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Since  M^(',z)  is  a  martingale  for  each  fixed  z,  Rebolledo’s  martingale  central  limit  theorem  can 
can  be  used  to  show  that  the  finite  dimensional  distributions  of  converge  to  those  of  m  (cf. 
the  proof  of  Theorem  2.3).  Finally,  {Af^,  n  >  1}  is  shown  to  be  tight  in  D?  by  checking  the 
moment  conditions  of  Bickel  and  Wichura  (1971). 

8.4-  The  general  proportional  hazards  model 

Tibshirani  (1984)  and  Hastie  and  Tibshirani  (1986)  considered  a  local  partial  likelihood  tech¬ 
nique  for  estimating  the  log  relative  risk  function  17 (z)  =  log  r(z)  in  the  general  proportional  hazards 
model  (4)  with  p  =  1.  O’Sullivan  (1986a,  1986b)  studied  a  penalized  partial  likelihood  estimator 
for  rj  and  established  consistency  of  that  estimator. 

McKeague  and  Utikal  (1988a)  considered  estimating  the  cumulative  relative  risk  function 
R(z)  =  fo  r(x) dx  fay  ^(*)  =  where  A  is  defined  by  (3.9).  By  Theorem  3.4  and  the 

continuous  mapping  theorem  (Billingsley,  1968)  we  obtain  that  y/n(R  -  R)^*mn,  where  mR  is  a 
continuous  Gaussian  martingale  with  covariance  function 

f*  iA*j  rl 


f*  lA*j  rl 

Cov(mii(z1),m/i(z2))  =  /  /  h{a,x)dadx, 

Jo  Jo 


provided  that  Ao  is  constrained  to  satisfy  Ao(l)  =  1  (to  ensure  identifiability).  Similarly,  the 
cumulative  baseline  hazard  function  A0  can  be  estimated  by  A(t)  =  A(t,  1).  If  R  is  constrained 
to  satisfy  JF2(1)  =  1,  then  y/n(k  —  Ao)-^m°,  where  m°  is  a  continuous  Gaussian  martingale  with 
covariance  function 

Cov(m°(ti),m0(t2))  =  /  /  h(a,x)dxds. 
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S.5.  Goodness- of -fit  tests 


There  is  an  extensive  literature  on  goodness-of-fit  tests  for  Cox's  proportional  hazards  regression 
model,  see  the  references  in  Arjas  (1988).  Recently,  McKeague  and  Utikal  (1988a,  1988b)  have 
developed  consistent  goodness-of-fit  tests  for  Cox’s  model,  Aalen's  additive  risk  model  and  the 
general  proportional  hazards  model  against  the  alternative  of  the  general  nonparametric  model 

(3). 

Consider  testing  the  null  hypothesis  Ho:  Cox’s  proportional  hazards  model  (1)  holds  over  the 
region  (t,  z)  €  [0,  l]2.  Under  Ho,  the  natural  estimator  of  A  is 

A(t,z)  =  k(t)  f  e^*dx, 

Jo 

where  f  and  A  are  defined  in  Section  3.1  and,  if  (T,,  Z,(T<))  falls  outside  [0,  l]2,  the  survival  time 
Ti  is  regarded  as  being  censored  (i.e.  8,  is  set  to  0).  The  Kolmogorov-Smirnov  type  test  statistic 
T M  =  \/nsup(tiI)e|01]3  | A(t,z)  -  A(t,  z)|  could  be  used  for  testing  H0-  The  following  result 
provides  a  way  of  determining  (based  on  simulation)  an  appropriate  critical  region  for  T <**) .  Define 
SM(0,t)  =n~1  5^r=1  *(0^(0  ^(0  ^  Zi[ 0  <  1)  efiZi^t\  and  define  the  quantities  s^3\  E  etc.  of 
Section  3.1  in  terms  of  this  SU) . 

Theorem  3.5.  (McKeague  and  Utikal,  1988b)  Suppose  that  the  conditions  of  Theorems  3.1  and  3.4 
hold.  Then,  under  H0,  y/n(A  —  A)^*m'  in  Da,  where 


■(M)  =  l  -  *W  jTjf ‘ 

'c(1,2)/o  L 


r/  X  Xo(u)e^° 
h(u’x)  =  1 -  t 

fz(u)Y  («)(*,!) 


g(u,  x)  =  A0(ti)  e^°*/2(u)  y  („)  (z,  1), 

6(z)  =  f  tPoXdx, 

Jo 

c(t,z)  =  E_1(A0(t)  J  xe^oXdx  —  b(z)  J  e(/30,u)  A0(u)du). 


Proof.  (Sketch)  Using  a  Taylor  series  expansion  of  t?*  about  ft  =  fo  and  the  representation  of 
y/n(k  —  Ao)  given  by  Andersen  and  Gill  (1982,  (2.8)),  it  can  be  shown  easily  that  y/n(A  —  A)  is 
asymptotically  equivalent  in  distribution  to  the  process  M ^  (t,  z)-6(z)  Mo(t)-c(t,  z)  Mi(l),  where 
is  defined  by  (3.10),  Mi(t)  =  n->U(#>>t)  and  Mo (t)  is  the  martingale  part  of  y/njA  -  Ao) 
given  by 


Using  Rebolledo’s  martingale  central  limit  theorem  (see  the  proof  of  Theorem  3.4  of  Andersen  and 
Gill,  1982)  it  can  be  shown  that  (Mo,Mi)-^>(mo,mi)  jointly  in  D[0,  l]3,  where  mo  and  mj  are  the 
independent  Gaussian  martingales  defined  in  Theorem  3.2.  The  key  step  in  the  proof  is  to  see  that 
(mo,  mi)  can  be  represented  in  terms  of  a  single  Brownian  sheet  process  W: 


-"-urn-** 

Then,  using  Rebolledo’s  martingale  central  limit  theorem  again,  and  also  using  Theorem  3.4,  it 
can  be  shown  that  (M,Mo,Mi)A(m,mo,mi)  jointly  in  x  Z)[0,  l]2.  Applying  the  continuous 
mapping  theorem,  we  obtain  y/n(A  —  -  bmo  -  emi(l)  =  m’.  This  completes  the  proof. 
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